Abstract: One of the most significant challenges in neuroscience is to understand how the human brain works. Recent progress in neuroimaging has shown that it is possible to decode a person's thoughts, memories, and emotions via functional magnetic resonance imaging (fMRI), since it measures the neural activation of the human brain with satisfactory spatiotemporal resolution. However, the unprecedented scale and complexity of fMRI data present critical computational bottlenecks that require new analytic tools. Given the increasingly important role of machine learning in neuroscience, a great many machine learning algorithms have been proposed to analyze brain activity from fMRI data. In this paper, we provide a comprehensive and up-to-date review of machine learning methods for analyzing neural activity, organized around three aspects: brain image functional alignment, brain activity pattern analysis, and visual stimuli reconstruction. In addition, online resources and open research problems on brain pattern analysis are provided for the convenience of future research.
1 Introduction


One of the most significant challenges in neuroscience and machine learning is comprehending how the human brain works. Because the brain is the source of human memory, emotion and thought, a better understanding of it will benefit society in many areas, including science, medicine and education[1-3]. Neural activity can be measured with different modalities, including event-related optical signals (EROS), positron emission tomography (PET), single-photon emission computed tomography (SPECT), near-infrared spectroscopy (NIRS), magnetoencephalography (MEG), electrocorticography (ECoG), electroencephalography (EEG), and functional magnetic resonance imaging (fMRI). Among these imaging modalities, fMRI is a non-invasive technique for probing the neurobiological substrates of various cognitive functions; it provides an indirect estimate of brain activity by measuring the metabolic changes in blood flow. Another advantage of fMRI is that it offers fine spatiotemporal resolution without known side effects, which provides more accurate information for the analysis of neural activity.
Based on fMRI images, many machine learning models have been applied to analyse the visual and subjective contents of the human brain[5-10]. Generally, machine learning-based methods build a mathematical model from sample fMRI data, namely the training data, in order to predict neural activity on a testing set without being explicitly programmed for that task. For instance, Kamitani and Tong[11] applied a linear regression model to classify brain states and found that the cognitive trials of subjects could be reliably predicted from ensemble fMRI signals recorded in early visual areas. Kay et al.[12] proposed a brain decoding method based on quantitative receptive field models, which learn a representation of the relationship between the stimulus images and the evoked fMRI data in early visual areas. Noticing that the proportion of voxels conveying discriminative information is small compared to the total number of measured voxels, Martino et al.[13] applied a recursive feature elimination (RFE) algorithm to eliminate irrelevant voxels and estimate informative spatial patterns. In another work, Yamashita et al.[14] proposed a linear classification algorithm called sparse logistic regression (SLR), which automatically selects relevant voxels and estimates their weight parameters for brain state estimation.
Although much progress has been achieved, major computational and statistical challenges remain in exploiting the unprecedented scale and complexity of fMRI data. Overcoming these challenges has become a major and active research topic in statistics and machine learning. Here, we summarize the main challenges for brain pattern analysis. First, a key component of fMRI research is the use of multi-subject datasets. However, both anatomical structure and functional topography (brain activity patterns) vary across subjects[15-17], so the neural activity of different subjects must be anatomically and functionally aligned before classification models are developed. Second, fMRI datasets are high-dimensional and contain redundant, noisy features[18, 19]. In specific experiments, such as visual or auditory stimulation, only part of the brain is activated, so selecting the key brain areas is a prerequisite for accurate analysis. Last but not least, although researchers have successfully improved the classification performance for identifying brain activity patterns, reconstructing visual stimuli from brain images is still a challenging task[20]. Compared with classification, the reconstruction of visual images provides more detailed information for understanding the human mind. In recent years, several reviews[21-23] have covered the mechanisms of brain encoding and decoding as well as common and classic methods. These reviews not only summarized up-to-date methods, but also presented the challenges in brain decoding and neuroscience. In view of the above challenges, the majority of this review is devoted to machine learning algorithms for the following four types of problems in brain decoding; the flowchart of the paper is shown in Fig. 1.

Firstly, in Section 2, we examine the problem of functional alignment for fMRI analysis across subjects, a pre-processing step for brain decoding that takes the variability between subjects into account. Since most of the research reviewed here belongs to this category, we review several fundamental alignment strategies, including linear and non-linear functional alignment. Secondly, in Section 3, we explore multivariate pattern classification and representation similarity analysis, which predict the neural patterns evoked by distinct stimuli and evaluate the similarities (or distances) between different cognitive tasks. Thirdly, in Section 4, we review methods for visual stimuli reconstruction, which generate the stimulus image from the corresponding fMRI signals. Finally, online resources and open research problems on brain pattern analysis are provided in Section 5.


2 Functional alignment 


One of the challenges in brain decoding is the fMRI analysis of multiple subjects[1, 9, 16]. Multi-subject fMRI data analysis is critical for evaluating whether research findings generalize across subjects. However, due to the heterogeneous patterns in multi-subject datasets, the fMRI data collected from different subjects must be aligned into a common space to overcome the between-subject variability[18]. From the perspective of machine learning, the alignment problem can be regarded as a multi-view representation learning problem. The underlying assumption is that some information is common across subjects, and aligning the data means extracting this common information. There are two main kinds of alignment methods: anatomical alignment and functional alignment. The most popular approach is anatomical alignment, which is based on anatomical features obtained from structural MRI images, e.g., Talairach alignment[24] or Montreal Neurological Institute (MNI) alignment[25, 26]. However, anatomical alignment alone cannot significantly improve decoding accuracy, since it is insufficient to address the variability in the functional topography of brains. The goal of functional alignment, on the other hand, is to precisely align the fMRI response spaces across subjects. In other words, it seeks a common space in which the correlation between responses to within-class stimuli is maximized and the correlation between responses to between-class stimuli is minimized, so that the neural activities of different classes are well separated[15, 16].

During the past decade, some studies have combined anatomical and functional features for fMRI functional alignment. For example, Conroy et al.[27] proposed an alignment method that uses cortical warping to maximize inter-subject pattern alignment. Similarly, cortical warping was used in [28] to maximize the inter-subject correlation (ISC). Also focusing on the maximization of ISC, Dmochowski et al.[29] aggregated the data collected from different subjects into a common matrix that takes cross-subject variability into account. Further, Michael et al.[4] proposed group independent component analysis and independent vector analysis for the functional alignment of resting-state fMRI (rs-fMRI). The algorithm does not assume that stimuli are simultaneous across subjects, so it concatenates the data along the temporal dimension, which implies spatial consistency, and learns spatially independent components. Building on these considerations, Haxby et al.[1] proposed a well-known alignment method called hyperalignment (HA), which aligns the neural activity patterns of different subjects in a common high-dimensional space. Hyperalignment is a functional alignment method that does not rely on anatomical features. As shown in Fig. 2, the basic hypothesis of the original hyperalignment model is that each subject's responses are a noisy rotation of a common template. HA uses the Procrustes transformation[30] to rotate the coordinate axes of each subject's representational space in order to align the response vectors of different subjects. The representational spaces of different subjects are aligned iteratively until a common space is generated for all subjects.
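The iterative Procrustes idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the original hyperalignment paper: the function names `procrustes_rotation` and `hyperalign`, the choice of the first subject as the initial template, and the synthetic data are all hypothetical.

```python
import numpy as np

def procrustes_rotation(source, target):
    """Orthogonal matrix R minimizing ||source @ R - target||_F."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt

def hyperalign(datasets, n_iter=5):
    """Iteratively rotate (time x voxel) matrices toward a shared template.

    Assumption: the first subject serves as the initial template, and the
    template is re-estimated as the mean of the rotated datasets.
    """
    template = datasets[0].copy()
    for _ in range(n_iter):
        rotations = [procrustes_rotation(x, template) for x in datasets]
        template = np.mean([x @ r for x, r in zip(datasets, rotations)], axis=0)
    rotations = [procrustes_rotation(x, template) for x in datasets]
    return rotations, template

# Synthetic example: every subject is a noisy rotation of one shared response.
rng = np.random.default_rng(0)
shared_response = rng.standard_normal((50, 8))  # 50 TRs, 8 voxels
subjects = []
for _ in range(3):
    q, _ = np.linalg.qr(rng.standard_normal((8, 8)))  # random orthogonal matrix
    subjects.append(shared_response @ q + 0.01 * rng.standard_normal((50, 8)))

rotations, template = hyperalign(subjects)
aligned = [x @ r for x, r in zip(subjects, rotations)]
```

After alignment, the rotated response matrices of different subjects should be much closer to each other than the raw matrices, which is the property that post-alignment classification relies on.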





Following the work of Haxby et al.[1], many improved alignment methods have been proposed to achieve better performance. These methods can be grouped by different criteria. For example, they can be divided into supervised, semi-supervised and unsupervised functional alignment methods according to whether label information is used, or into linear and non-linear models according to how the transformation matrix is derived. In this paper, we introduce several classic and state-of-the-art functional alignment methods following the second criterion.


2.1 Linear transformation methods for 
functional alignment 


Let the matrices $X_i \in \mathbb{R}^{t \times v}$, $i = 1, \dots, n$, record the data of the subjects. Here, n, t and v represent the number of subjects, the number of TRs (repetition times) and the number of voxels, respectively. Mathematically, HA can be formulated through the framework of canonical correlation analysis (CCA):

$$\min_{R_i, R_j} \sum_{i=1}^{n} \sum_{j=i+1}^{n} \left\| X_i R_i - X_j R_j \right\|_F^2 \quad \text{s.t.} \quad R_k^{\top} X_k^{\top} X_k R_k = I, \quad k = 1, 2, \cdots, n. \tag{1}$$

Fig. 2 Mapping different subjects into common space via hyperalignment



Based on the study of Haxby et al.[1], several improved methods have been proposed to enhance the performance of hyperalignment. Xu et al.[31] proposed regularized hyperalignment (RHA), which iteratively finds the optimal regularization parameters using the expectation-maximization (EM) algorithm. RHA showed that the weights of the singular vectors in each normalized dataset are controlled by the corresponding regularization parameters, and that classification accuracy can be improved by adjusting them. Chen et al.[32] proposed singular value decomposition hyperalignment (SVDHA), which first reduces the dimensionality of the fMRI data by applying a joint singular value decomposition to the response matrices. HA is then used to align the subjects in the new lower-dimensional feature space, which reduces computation time while retaining classification accuracy. Furthermore, Chen et al.[33] proposed the shared response model (SRM) as another functional alignment method. SRM can be viewed as a variant of probabilistic principal component analysis (PCA) in which orthogonality constraints are imposed on the loading matrices. A key attribute of SRM is its built-in dimensionality reduction, which yields a low-dimensional shared feature space. In other studies, Sui et al.[34, 35] applied multimodal CCA and independent component analysis (ICA) to multimodal data, so that both modality-specific and shared variance associations could be identified. The above studies are all based on unsupervised machine learning. However, in visual stimulation tasks, supervision information such as stimulus image labels is also available. Therefore, Yousefnezhad and Zhang[16] proposed a supervised HA method named local discriminant hyperalignment (LDHA), which brings the concept of linear discriminant analysis (LDA) into CCA and improves on the performance of the unsupervised HA methods.
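To make the role of the orthogonality constraint and the low-dimensional shared space in SRM more concrete, the sketch below fits a deterministic variant by alternating minimization of $\sum_i \| X_i - S W_i^{\top} \|_F^2$ subject to $W_i^{\top} W_i = I_k$. It is an assumption-laden illustration rather than the probabilistic model of the cited work: the function name `fit_srm`, the random initialization, and the synthetic data are hypothetical.

```python
import numpy as np

def fit_srm(datasets, k, n_iter=10, seed=0):
    """Deterministic SRM sketch: X_i ~ S @ W_i.T with W_i.T @ W_i = I_k.

    datasets: list of (t x v_i) arrays that share the number of TRs t.
    Returns the bases W_i (v_i x k), the shared response S (t x k), and the
    value of the objective after each iteration.
    """
    rng = np.random.default_rng(seed)
    # Random orthonormal starting bases (QR of a Gaussian matrix).
    bases = [np.linalg.qr(rng.standard_normal((x.shape[1], k)))[0] for x in datasets]
    losses = []
    for _ in range(n_iter):
        # With the bases fixed, the best shared response is the mean projection.
        shared = np.mean([x @ w for x, w in zip(datasets, bases)], axis=0)
        # With the shared response fixed, each basis is a Procrustes solution.
        bases = []
        for x in datasets:
            u, _, vt = np.linalg.svd(x.T @ shared, full_matrices=False)
            bases.append(u @ vt)
        losses.append(sum(np.linalg.norm(x - shared @ w.T) ** 2
                          for x, w in zip(datasets, bases)))
    return bases, shared, losses

# Synthetic example: 3 subjects, 40 TRs, 20 voxels each, 3 shared features.
rng = np.random.default_rng(1)
true_shared = rng.standard_normal((40, 3))
subjects = []
for _ in range(3):
    w0 = np.linalg.qr(rng.standard_normal((20, 3)))[0]
    subjects.append(true_shared @ w0.T + 0.05 * rng.standard_normal((40, 20)))

bases, shared, losses = fit_srm(subjects, k=3)
```

Because each of the two updates exactly minimizes the objective with the other block held fixed, the recorded loss cannot increase from one iteration to the next.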


2.2 Non-linear transformation methods for 
functional alignment 


All the HA methods mentioned above find the transformation matrix of each subject by solving a linear model and project the response matrices of different subjects into a common space. However, real-world data are often nonlinear and high-dimensional. Therefore, several nonlinear HA methods have been proposed for the alignment of different subjects. For example, Lorbert and Ramadge proposed a non-linear method called kernel hyperalignment (KHA), which performs the non-linear transformation in an embedded kernel space. KHA can simultaneously address the voxel and feature expansion problems, and the limiting factor of HA shifts from the number of voxels to the number of subjects. Chen et al.[36] developed a convolutional auto-encoder (CAE) for functional alignment of whole-brain fMRI data. As another nonlinear HA method, CAE first reformulates SRM as a multi-view autoencoder and then applies the standard searchlight (SL) analysis to improve the stability and robustness of the cognitive classification model.

With the rapid development of deep neural networks, their powerful fitting ability provides another effective transformation for nonlinear HA. Yousefnezhad and Zhang[17] proposed deep hyperalignment (DHA) as an unsupervised kernel model. As can be seen from Fig. 3, DHA uses a deep network, i.e., multiple stacked layers of nonlinear transformations, as the kernel function, and is optimized via rank-m SVD and stochastic gradient descent (SGD). DHA not only handles nonlinear, high-dimensional transformations, but also performs well on classification tasks.

Recently, Li et al.[15] used a cross-subject graph to describe the similarities or dissimilarities among different subjects for HA on fMRI datasets. One advantage of this method is that a new kernel-based optimization algorithm is used for nonlinear feature extraction. The alignment results of several existing methods, together with the datasets they used, are reported in Table 1. More information about the datasets can be found in Section 5.1. The figures in parentheses indicate the number of categories in each dataset (ROI: region-of-interest; WB: whole-brain; PMC: posterior medial cortex).


3 Brain activity pattern analysis 


After functional alignment of multi-subject fMRI datasets, a common space across subjects is obtained. In this new representational space, the brain activities of different subjects are represented by response matrices, in which each element denotes the activity value of a voxel or ROI. Brain activity feature analysis techniques have been proposed to obtain the most discriminative features from these high-dimensional, sparse, and noisy response matrices, which is an essential prerequisite for the subsequent classification or reconstruction task.

Fig. 3 Deep hyperalignment[17]

Table 1 Performance of HA methods in post-alignment classification (%)

Methods: HA[1]; RHA[31]; KHA; SRM[33]; CAE[36]; LDHA[16]; DHA[17]; graph-based decoding model (GDM)[15]

Datasets: Raider, DS105; [37]; Raider(7); Sherlock-movie, Raider-movie, Forrest-audio; Audiobook; Audiobook, Sherlock-movie; Movie and recall (WB), Movie and recall (PMC); DS005(2), DS105ROI(8), DS107ROI(4), DS117(2); DS005(2), DS105ROI(8), DS107ROI(4), DS116(2), DS117(2); DS105WB(8), DS105ROI(8), DS011(2), DS203(4), DS001(4), Raider(7)

Accuracy (%): 70.6±2.6 for movie time segments, 63.9±2.2 for faces and objects, 68.0±2.8 for animal species; 33.5±1.0, 79.4±1.0, 46.5±1.0, 74.6±1.0, 18.3±1.0, 42.3±1.0, 11.8±1.0, 9.4±1.0; 94.32±0.16, 54.04±0.09, 74.73±0.19, 95.07±0.27; 97.92±0.82, 60.39±0.68, 73.05±0.63, 90.28±0.71, 97.99±0.94; 60.68±5.23, 62.22±4.23, 92.49±2.24, 82.47±1.45, 62.68±1.53, 64.52±3.28; 80.85 (average performance 21% above that of basic hyperalignment); 48.93 (average) for ventral temporal, 36.34 (average) for entire cortex

AUC (%): 93.25±0.92, 53.86±0.17, 72.03±0.37, 94.23±0.94; 96.91±0.82, 59.57±0.32, 70.23±0.92, 89.93±0.24, 96.13±0.32

Research on brain activity feature analysis could be
traced back to 2001. Haxby et al.[38] found that when different images are presented to a subject, different categories of visual stimuli induce different fMRI response patterns. Following their work, several brain activity pattern analysis methods[39-42] have been proposed during the last two decades. A key concept in brain encoding and decoding is the high-dimensional representational vector space. Neural responses, also known as patterns of brain activity, are vectors in such neural representational spaces, and they are distributed both spatially and temporally. The features (elements) of these patterns are local measurements of brain activity, and each local measurement corresponds to one dimension of the space. Currently, numerous techniques can be applied to task-based fMRI datasets. Among them, multi-voxel pattern analysis (MVPA) and representation similarity analysis (RSA) can effectively extract and decode brain activity patterns. In this section, we introduce these two high-dimensional feature analysis techniques.


3.1 Multi-voxel pattern analysis 


In the early days of fMRI data analysis, univariate methods were mainly used for brain activity pattern recognition. In most of these methods, a general linear model (GLM)[43] was fitted to each voxel in the brain separately, and the results were presented as an image of model parameters or derived statistics[40]. However, researchers found that univariate methods are not sufficient for the analysis of fMRI data, and multivariate analysis has received more and more attention.

Due to the high spatial resolution of fMRI and the particularities of the imaging method, fMRI data are high-dimensional and have a low signal-to-noise ratio. The traditional univariate method treats each voxel as an independent feature and ignores the correlation between features, which makes it difficult to detect spatial patterns[40]. Multivariate pattern (MVP) analysis, as an alternative to the traditional univariate method, can more accurately detect the distribution of activation in the brain and decode the cognitive state. Therefore, MVP analysis is widely used in neuroimaging studies.

Information is encoded in brain activity patterns. This information comes from people's experience, or from their thinking about and imagination of the world. MVP analysis is a modern approach drawn from computational advances of the last two decades. As one of the early studies, Haxby et al. illustrated how cognitive states can be distinguished by multi-voxel brain activity patterns. They proposed a classifier based on split-half correlation. The experimental results showed that a distributed representation of eight categories, such as bottles, faces and houses, is contained in the ventral temporal (VT) cortex. Furthermore, these categories could be decoded from human brain activity[42].
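A split-half correlation classifier of the kind described above can be sketched in a few lines. The sketch assumes that each half of the data has already been averaged into one mean pattern per category; the function name `split_half_classify` and the synthetic prototypes are hypothetical, and the preprocessing of the original study is not reproduced.

```python
import numpy as np

def split_half_classify(train_patterns, test_patterns):
    """Label each test pattern with the most correlated training pattern.

    Both inputs are (n_categories x n_voxels) arrays; row c holds the mean
    response pattern of category c in one half of the data.
    """
    m = test_patterns.shape[0]
    # np.corrcoef stacks the rows of both inputs; the upper-right block holds
    # the test-versus-train Pearson correlations.
    corr = np.corrcoef(test_patterns, train_patterns)[:m, m:]
    return corr.argmax(axis=1)

# Synthetic example: 4 categories, 100 voxels, independent noise in each half.
rng = np.random.default_rng(0)
prototypes = rng.standard_normal((4, 100))
odd_runs = prototypes + 0.5 * rng.standard_normal((4, 100))
even_runs = prototypes + 0.5 * rng.standard_normal((4, 100))

predicted = split_half_classify(odd_runs, even_runs)
```

A category is decoded correctly when the within-category correlation between the two halves exceeds every between-category correlation.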

Recently, sparse learning methods have also been used to select the most discriminative voxels for brain activity pattern analysis[5, 14, 44]. Specifically, Yamashita et al.[14] proposed sparse logistic regression (SLR), a linear model with built-in feature selection. SLR automatically chooses the most discriminative voxels in the brain and estimates their weights for cognitive state identification. Moreover, Ryali et al.[44] proposed a logistic regression-based method that combines $\ell_1$- and $\ell_2$-norm regularization to select discriminative brain regions across multiple conditions or groups. Grosenick et al.[45] developed a graph-constrained elastic net (GraphNet) for whole-brain regression and classification that automatically provides interpretable coefficient maps. In addition, Yousefnezhad and Zhang[41] proposed an MVP analysis method based on the AdaBoost algorithm, named imbalance AdaBoost binary classification (IABC). IABC converts an imbalanced MVP analysis problem into a set of balanced problems to improve fMRI analysis performance. Meel et al.[46] used MVP and functional connectivity analysis to study the representation of (vertical) symmetry in regions of the ventral visual stream. Wen et al.[5] proposed a feature selection method based on group sparse Bayesian logistic regression (GSBLR), which selects the most relevant voxels for binary brain decoding. Grouped automatic relevance determination (GARD) is used as the prior on the parameters, in accordance with the group sparsity of fMRI data.
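The voxel-selection effect of sparsity-inducing classifiers can be illustrated with a small example. Note that the sketch swaps in a plain $\ell_1$-penalized logistic regression solved by proximal gradient descent; it is not the Bayesian SLR or GSBLR of the cited works. The function names, the penalty `lam`, the step size and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(w, thr):
    """Proximal operator of the l1 norm: shrink toward zero, clip at zero."""
    return np.sign(w) * np.maximum(np.abs(w) - thr, 0.0)

def l1_logistic_regression(x, y, lam=0.05, step=0.1, n_iter=500):
    """Proximal gradient descent for l1-penalized logistic regression.

    x: (n_samples x n_voxels) features, y: labels in {0, 1}.
    Voxels whose weight stays exactly zero are treated as not selected.
    """
    w = np.zeros(x.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(x @ w)))   # predicted probabilities
        grad = x.T @ (p - y) / len(y)        # gradient of the mean log-loss
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Synthetic example: only the first 3 of 50 voxels carry label information.
rng = np.random.default_rng(0)
x = rng.standard_normal((200, 50))
true_w = np.zeros(50)
true_w[:3] = [2.0, -2.0, 1.5]
y = (x @ true_w + 0.1 * rng.standard_normal(200) > 0).astype(float)

w = l1_logistic_regression(x, y)
selected = np.flatnonzero(w)  # indices of the voxels kept by the model
```

Increasing `lam` in this sketch prunes more voxels, which mirrors the trade-off between sparsity and fit that the methods above tune in different ways.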


3.2 Representation similarity analysis 


RSA is another well-known method that is widely used in brain activity pattern analysis to evaluate the similarities between cognitive states[47, 48]. In a visual stimuli task, the fMRI signals of subjects are acquired while they watch different categories of images or videos. Different categories of stimuli evoke corresponding activity patterns in the brain of a subject, and RSA is then used to calculate the similarities between the resulting cognitive states. This process generates the representational similarity matrix (RSM), which encodes the similarity structure of the different cognitive tasks. Fig. 4 shows the computational steps for deriving the RSM. In the RSM, each entry is the correlation between the activity patterns of a pair of stimuli (i.e., conditions in the experiment). The diagonal elements of the RSM are equal to 1, and each off-diagonal element represents the similarity of the brain's responses to two different stimuli: the larger the value, the higher the similarity, and vice versa.
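The computation of an RSM from condition-wise activity patterns can be sketched as below. The example assumes that Pearson correlation is the similarity measure and that one activity pattern per condition is already available; the function name and the synthetic patterns are hypothetical.

```python
import numpy as np

def representational_similarity_matrix(patterns):
    """Pearson correlation between every pair of condition patterns.

    patterns: (n_conditions x n_voxels) array, one activity pattern per row.
    """
    return np.corrcoef(patterns)

# Synthetic example: conditions 0 and 1 share an underlying pattern,
# condition 2 is unrelated.
rng = np.random.default_rng(0)
base = rng.standard_normal(200)
patterns = np.vstack([
    base + 0.3 * rng.standard_normal(200),
    base + 0.3 * rng.standard_normal(200),
    rng.standard_normal(200),
])

rsm = representational_similarity_matrix(patterns)
```

Subtracting the RSM from 1 gives the corresponding correlation-distance (dissimilarity) matrix, in which the diagonal is 0 instead of 1.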


Fig. 4 Computation of RSM (panels: visual stimuli, subject, brain activity patterns, RSM)


Classic RSA is mainly based on traditional linear methods, e.g., GLM[43] and ordinary least squares (OLS)[47]. In fact, RSA can be regarded as a multi-task regression problem. Kriegeskorte et al.[47] used OLS to fit a linear model to the time course of each voxel in order to measure the spatial activity pattern evoked by each condition. This linear model includes a hemodynamic response predictor for each condition, as well as optional further predictors for modeling artifacts such as trends, head movement effects, and baseline shifts between measurement runs. RSA[48] assumes that the brain activity patterns are related to the stimulus events, which can be formulated as

$$Y^{(\ell)} = X^{(\ell)} B^{(\ell)} + \epsilon^{(\ell)} \tag{2}$$

where $Y^{(\ell)} = \{y_{mj}\} \in \mathbb{R}^{T \times V}$, $1 \le m \le T$, $1 \le j \le V$, denotes the fMRI time series from the $\ell$-th subject, $T$ is the number of repetition times (TRs) and $V$ is the number of voxels. The design matrix is denoted by $X^{(\ell)} = \{x_{mk}\} \in \mathbb{R}^{T \times P}$, $1 \le m \le T$, $1 \le k \le P$. The design matrix $X^{(\ell)}$ can be obtained by convolving the stimulus time series with a typical hemodynamic response function (HRF). Here, $P$ denotes the number of stimulus categories, $B^{(\ell)} = \{\beta_{kj}\} \in \mathbb{R}^{P \times V}$, $\beta_{kj} \in \mathbb{R}$, $1 \le k \le P$, $1 \le j \le V$, denotes the estimated regression matrix, and $\beta_{kj}$ is an amplitude reflecting the response of the $j$-th voxel to the $k$-th stimulus. GLM is a linear model and cannot achieve satisfactory results, since the data matrix is usually wide, i.e., the number of voxels far exceeds the number of time points in an fMRI dataset. Moreover, this method makes it difficult to convert the data into a matrix[41]. The stability and robustness of the method also decrease as the signal-to-noise ratio (SNR) decreases. Further, GLM and OLS are prone to overfitting. Most existing studies avoid overfitting by adding regularization terms. For instance, the least absolute shrinkage and selection operator (LASSO)[49] solves the regression problem with an $\ell_1$-norm penalty, whereas ridge regression[50] uses an $\ell_2$-norm penalty. The elastic net[51], as a modified model, addresses the above issues by combining the $\ell_1$ and $\ell_2$ norms.
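The regression model in (2) and the effect of an $\ell_2$ penalty can be illustrated with a short example. The sketch below builds a design matrix by convolving stimulus indicators with an HRF and then solves the OLS and ridge problems in closed form. The single-gamma HRF shape, the event timing, the function names and the value of `alpha` are assumptions chosen for illustration, not settings taken from the cited works.

```python
import numpy as np

def make_design_matrix(onsets, hrf):
    """Convolve each stimulus indicator column with the HRF (truncated to T)."""
    n_trs = onsets.shape[0]
    columns = [np.convolve(onsets[:, k], hrf)[:n_trs] for k in range(onsets.shape[1])]
    return np.column_stack(columns)

def fit_glm(design, bold, alpha=0.0):
    """Minimize ||Y - X B||_F^2 + alpha * ||B||_F^2; alpha = 0 gives OLS."""
    gram = design.T @ design + alpha * np.eye(design.shape[1])
    return np.linalg.solve(gram, design.T @ bold)

rng = np.random.default_rng(0)
n_trs, n_conditions, n_voxels = 120, 3, 10

# Assumed single-gamma HRF sampled at the TR; only a rough stand-in.
t = np.arange(16)
hrf = t ** 5 * np.exp(-t)
hrf = hrf / hrf.sum()

# Each condition is presented every 30 TRs, staggered by 10 TRs.
onsets = np.zeros((n_trs, n_conditions))
for k in range(n_conditions):
    onsets[5 + 10 * k :: 30, k] = 1.0

design = make_design_matrix(onsets, hrf)            # X, shape (T x P)
true_beta = rng.standard_normal((n_conditions, n_voxels))
bold = design @ true_beta + 0.01 * rng.standard_normal((n_trs, n_voxels))  # Y

beta_ols = fit_glm(design, bold)
beta_ridge = fit_glm(design, bold, alpha=1.0)
```

The ridge estimate is a shrunken version of the OLS estimate, which is the mechanism by which the penalty trades a small bias for lower variance.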

On the other hand, the searchlight (SL) was introduced as an alternative to region-of-interest (ROI) based fMRI analysis. SL performs MVP analysis on sphere-shaped groups of voxels centered on each voxel in turn. As mentioned before, due to the high spatial resolution of fMRI data, whole-brain datasets are high-dimensional. In the past, when using RSA methods, it was difficult to convert the data into a matrix, and inverting the voxel matrix could not be avoided. In addition, when the number of voxels is too large, RSA optimization is also hampered by the high dimensionality of the data. Fortunately, compared with traditional RSA algorithms, modern RSA algorithms optimize the solution process[52]. Su et al.[53] proposed an RSA method that uses the searchlight technique for EMEG (a combination of MEG and EEG). This method directly implements MVP analysis of the information flow in the human brain and the spatial and temporal identification of fine-grained dynamic neural computations. As an extended application, SL-based RSA can also be applied to analyze the structure of the ethical violation space[54].
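The searchlight procedure, i.e., evaluating a multivariate statistic on a sphere of voxels around every voxel in turn, can be sketched as follows. The function names, the brute-force neighbor search and the toy scoring function are assumptions for illustration; practical implementations use more efficient neighborhood lookups and a cross-validated classifier or an RSA statistic as the score.

```python
import numpy as np

def searchlight_spheres(mask, radius):
    """Yield (center, neighbors) for every voxel inside a boolean 3-D mask.

    neighbors is an (n x 3) array of voxel coordinates whose Euclidean
    distance to the center is at most `radius` (in voxel units).
    """
    coords = np.argwhere(mask)
    for center in coords:
        dist = np.linalg.norm(coords - center, axis=1)
        yield tuple(center), coords[dist <= radius]

def searchlight_map(data, mask, radius, score_fn):
    """Apply score_fn to the (n_samples x n_neighbors) features of each sphere.

    data: (n_samples, x, y, z) array. Returns a 3-D map of scores, with NaN
    outside the mask.
    """
    out = np.full(mask.shape, np.nan)
    for center, neighbors in searchlight_spheres(mask, radius):
        features = data[:, neighbors[:, 0], neighbors[:, 1], neighbors[:, 2]]
        out[center] = score_fn(features)
    return out

# Toy example: the "score" is simply the number of voxels in each sphere.
rng = np.random.default_rng(0)
mask = np.ones((5, 5, 5), dtype=bool)
data = rng.standard_normal((6, 5, 5, 5))
result = searchlight_map(data, mask, radius=1.0, score_fn=lambda f: f.shape[1])
```

With a radius of one voxel, an interior sphere contains the center and its six face neighbors, while spheres at the corners of the volume are truncated by the mask.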

In short, RSA provides researchers with a new perspective for comparing representations across different subjects, different ROIs within one subject, different measurement modalities, and even different species. Since similarity structures can be estimated from imaging data even without encoding models, RSA can be used not only for model testing but also for exploratory research[48]. RSA was initially used to study visual representations[43, 55, 56], semantic representations[52, 57] and lexical representations[53]. Last but not least, RSA can also be applied to reveal the representations of social networks[58, 59].


4 Visual stimuli reconstruction 


Like classification and regression tasks in machine learning, the purpose of brain decoding is to analyze a subject's brain activity patterns in order to identify the visual stimuli or reconstruct their details. In recent years, many studies have addressed the classification of brain activity patterns[38, 45]. However, the reconstruction of visual stimuli from brain images is still a challenging task. A general conceptual framework for visual stimuli reconstruction is shown in Fig. 5; it can be regarded as a cross-modal reconstruction (the green line represents image reconstruction, while the blue line denotes the fMRI). Visual stimuli reconstruction focuses on learning the features shared by the stimulus images and the fMRI data in order to generate the stimulus images from the corresponding fMRI signals.

Many researchers have made preliminary explorations in visual stimuli reconstruction. As an early exploratory study, Thirion et al.[60] used rotating Gabors to reconstruct dot patterns from stimuli and imagery. They predicted the visual stimuli of both real and imaginary scenes from the evoked brain activity in the visual cortex. Moreover, Miyawaki et al.[61] asked volunteers to watch a large number of flashing checkerboard images as visual stimuli, recorded the brain activity patterns evoked in the early visual cortex (V1/V2/V3), and then built a sparse multinomial logistic regression (SMLR) based multi-scale local decoder model for visual stimuli reconstruction. The experimental results showed that this method provides a new way to interpret the visual perception of the brain.

Fig. 5 General conceptual framework for visual stimuli reconstruction (labels: stimuli, image latent space, reconstructed image)

In recent years, many methods have been proposed for visual stimuli reconstruction. These methods can be divided into traditional machine learning methods and more recent deep network frameworks. Among the traditional machine learning methods, Bayesian models are the most common. In this paper, we review recent progress from two aspects: Bayesian-based reconstruction models and deep generative model-based reconstruction methods.

4.1 Bayesian-based reconstruction model 


Inspired by the work of Miyawaki et al.[61], several
reconstruction models based on Bayesian models have been
proposed to exploit the correlations among the signals
recorded in fMRI that reflect the features of the
corresponding stimuli images. For example, Naselaris et
al.[62] proposed a joint model that combines structural
and semantic features of brain activity patterns, and used
a Bayesian framework to infer the stimuli images from a
large-scale dataset via the evoked brain activities.
Nishimoto et al.[63] used a Bayesian decoding framework
for movie scene reconstruction from the given
blood-oxygen-level-dependent (BOLD) signals. The authors
proposed a motion-energy encoding model that largely
overcomes the limitation imposed by the slowness of the
BOLD signals measured via fMRI. Further, a model called
Bayesian canonical correlation analysis (BCCA) was
proposed by Fujiwara et al.[64] to automatically learn
image bases. CCA was used to construct an invertible
mapping based on the Bayesian model. Zhan et al. proposed
a reconstruction method based on a support vector machine
(SVM) and Bayesian classifier followed by ICA to improve
the efficiency of feature extraction and reconstruction
performance. Cowen et al. used PCA to transform human
face stimuli into a new feature space, then established
the relationship between the new features and fMRI
signals, and realized the reconstruction of human face
stimuli for the first time. Du et al. proposed a
Bayesian-based reconstruction method that derives missing
latent variables by Bayesian inference. The joint
generative model of external stimuli and brain activities
they proposed can not only extract non-linear features of
the stimuli images, but also capture the correlation among
brain activities. The reconstruction models based on the
Bayesian framework aim to find the relationship between
the visual stimuli and the corresponding fMRI signals, and
establish a linear mapping between them to achieve image
reconstruction. However, a linear mapping often cannot
fully capture the relationship between the two modalities,
and the reconstruction results obtained are often
coarse-grained, making it difficult to describe the
details of the images.
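
The Bayesian decoding idea shared by several of these models can be illustrated with a toy identification example. The sketch below is not any cited author's implementation: it assumes a known linear encoding model with independent Gaussian noise, and the weights `W`, the noise level `sigma`, and the candidate stimuli are all hypothetical. It selects the candidate stimulus s with the highest posterior p(s | r), which is proportional to p(r | s) p(s).

```python
import math

def predict_response(W, s):
    """Linear encoding model: predicted voxel responses r_hat = W s."""
    return [sum(w * x for w, x in zip(row, s)) for row in W]

def log_likelihood(r, r_hat, sigma):
    """log p(r | s) under independent Gaussian noise with std sigma."""
    sq_err = sum((a - b) ** 2 for a, b in zip(r, r_hat))
    norm = len(r) * math.log(sigma * math.sqrt(2.0 * math.pi))
    return -0.5 * sq_err / sigma ** 2 - norm

def map_decode(r, W, candidates, log_prior, sigma=1.0):
    """Return the index of the candidate with the highest log posterior."""
    scores = [
        log_likelihood(r, predict_response(W, s), sigma) + lp
        for s, lp in zip(candidates, log_prior)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Hypothetical toy setup: 3 voxels, 2 stimulus features, 3 candidates.
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
log_prior = [math.log(1.0 / 3.0)] * 3      # uniform prior over candidates
r_observed = [0.9, 0.1, 1.1]               # noisy response to candidate 0

best, scores = map_decode(r_observed, W, candidates, log_prior)
print(best)
```

With a non-uniform `log_prior`, the same function trades off the fit to the measured response against how plausible each candidate is a priori, which is the role the image prior plays in the Bayesian frameworks described above.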

4.2 Deep network-based reconstruction model





In the last decade, deep learning has drawn significant
attention for its powerful fitting and generating
capabilities. The variational autoencoder (VAE) and the
generative adversarial network (GAN) are two of the most
popular approaches. A VAE describes observations in the
latent space in a probabilistic manner. Therefore, instead
of constructing an encoder that outputs a single value to
describe each latent state attribute, an encoder is used
to describe the probability distribution of each latent
attribute. By sampling from the latent space, the decoder
network forms a generative model that can create new data
similar to the observations in the training data. In other
words, we can sample from the prior distribution p(z),
which is assumed to follow a unit Gaussian distribution.
Recently, Du et al. proposed a deep generative multi-view
model (DGMM) for stimuli image reconstruction from the
evoked brain activity patterns. DGMM can be regarded as a
nonlinear extension of BCCA by combining image generation
models with Bayesian inferences to accomplish
reconstruction tasks.
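
The probabilistic encoding described above can be sketched in a few lines. This is a minimal illustration, assuming a diagonal Gaussian posterior and the standard reparameterization trick (neither is spelled out in the text); the encoder outputs `mu` and `log_var` are hypothetical numbers rather than the output of a trained network.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), elementwise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def kl_to_unit_gaussian(mu, log_var):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

rng = random.Random(0)

# Hypothetical encoder outputs for one input: a mean and a log-variance
# per latent attribute, instead of a single value per attribute.
mu, log_var = [0.5, -1.0], [0.0, -2.0]

z = reparameterize(mu, log_var, rng)           # latent sample for the decoder
kl = kl_to_unit_gaussian(mu, log_var)          # penalty pulling q(z|x) to p(z)
kl_prior = kl_to_unit_gaussian([0.0, 0.0], [0.0, 0.0])

# Generating new data: draw z directly from the unit Gaussian prior p(z)
# and pass it to the decoder (the decoder itself is omitted here).
z_prior = [rng.gauss(0.0, 1.0) for _ in mu]
```

The KL term is zero exactly when the encoder distribution matches the unit Gaussian prior, which is what makes sampling from p(z) at generation time meaningful.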

Among other deep learning approaches, Horikawa and
Kamitani presented a brain decoding method based on a
computer vision principle, which represents the categories
with a group of latent features through hierarchical
processing. In this way, they found that the features of
visual images can be predicted from the brain activities
of subjects. Shen et al. trained a model based on a deep
neural network (DNN) to establish an end-to-end
reconstruction model from visual stimuli images and the
evoked brain activity patterns. Experimental results
showed that the proposed model can learn a direct mapping
for perceptual reconstruction.

GAN is another important model in the field of deep
learning. The original GAN model was proposed by
Goodfellow et al.[69] in 2014, where the discriminator and
generator play the following mini-max game:

$$\min_{p_g} \max_{\theta} F(p_g, \theta) = \mathbb{E}_{x \sim p_{data}}\left[\log p_{\theta}(y = 1 \mid x)\right] + \mathbb{E}_{x \sim p_g}\left[\log p_{\theta}(y = 0 \mid x)\right] \tag{3}$$

where $p_{data}$ is the distribution of real data, $p_g$
is the distribution of generated data, $p_{\theta}$ is the
distribution of the discriminator with parameter $\theta$,
and $\mathbb{E}$ represents the mathematical expectation.
The training stage of GAN can be seen as a zero-sum game,
in which the generator tries to generate data that can
fool the discriminator, and the discriminator tries to
distinguish the real data from the generated fake data,
labelling them with $y = 1$ and $y = 0$, respectively.
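
As a small numerical illustration of Eq. (3), the sketch below estimates the value F from discriminator outputs on a batch of real and generated samples. The function name `gan_value` and the example probabilities are hypothetical; the only fact used is that, for a binary label, the probability of y = 0 is one minus the probability of y = 1.

```python
import math

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of F in Eq. (3).

    d_real: discriminator outputs p(y=1|x) for real samples x ~ p_data
    d_fake: discriminator outputs p(y=1|x) for generated samples x ~ p_g
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    # p(y=0|x) = 1 - p(y=1|x) for the generated samples
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that separates real from fake well yields a high value...
confident = gan_value([0.9, 0.8], [0.1, 0.2])
# ...while a fooled discriminator (0.5 everywhere) yields 2 * log(0.5).
fooled = gan_value([0.5, 0.5], [0.5, 0.5])
print(confident, fooled)
```

The discriminator is trained to increase this value and the generator to decrease it, which is the zero-sum structure described above.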

Some GAN-based visual stimuli reconstruction models have
been proposed and have greatly improved the precision of
the reconstruction results. For instance, St-Yves and
Naselaris[70] used a GAN architecture to learn an image
generation model and completed perceptual stimuli
reconstruction through this model. In this way, the noise
model can be inferred from the measured brain activity.
Furthermore, some GAN-based approaches have been proposed
to reconstruct human face images. Güçlütürk et al.[71]
proposed a joint model that combines probabilistic
inference with the GAN architecture for face stimuli
reconstruction from human brain activities. They used
maximum a posteriori estimation to invert the linear
transformation from features in latent space to brain
activity patterns. Then, convolutional neural networks
(CNNs) were used to invert the non-linear transformation
from visual stimuli to latent features. Seeliger et
al.[72] introduced a deep convolutional generative
adversarial network (DCGAN) architecture to reconstruct
the stimuli images. They also used a linear model to
predict the latent space of the generative model from the
evoked brain activity patterns. More recently, VanRullen
and Reddy[73] presented thousands of celebrity face images
from a large dataset to the subjects as visual stimuli.
Then, they trained a VAE neural network using a GAN
architecture over this dataset and learnt a linear mapping
between face images and fMRI activity patterns. Compared
with the classic linear reconstruction methods, models
based on deep networks can implement non-linear
transformations that greatly improve the accuracy of image
reconstruction and describe images at a finer granularity.
Fig.6 shows some experimental results of several visual
stimuli reconstruction tasks in recent years. In addition
to the visual stimuli reconstruction methods based on
Bayesian models or deep neural networks mentioned above,
several methods[74-78] use semi-supervised learning (SSL)
to improve brain decoding performance by leveraging a
large number of images, since it is difficult to collect a
large amount of paired image-fMRI data for training.
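
Several of the methods above rely on a linear mapping from fMRI activity patterns to the latent space of a generative model. The following is a minimal sketch of such a mapping in plain Python. Ridge regression, the toy data, and the helper names `solve`, `ridge_fit`, and `predict` are illustrative assumptions, not the cited authors' implementations; the pretrained generator that would turn a predicted latent code into an image is omitted.

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(a) + list(b) for a, b in zip(A, B)]   # augmented [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * c for v, c in zip(M[r], M[col])]
    return [row[n:] for row in M]

def ridge_fit(X, Z, lam):
    """W = (X^T X + lam I)^{-1} X^T Z, mapping voxel patterns to latents."""
    d = len(X[0])
    Xt, Zt = list(zip(*X)), list(zip(*Z))
    gram = [[sum(a * b for a, b in zip(Xt[i], Xt[j]))
             + (lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    xtz = [[sum(a * b for a, b in zip(Xt[i], Zt[k]))
            for k in range(len(Zt))] for i in range(d)]
    return solve(gram, xtz)

def predict(X, W):
    """Apply the linear map: one latent code per row of X."""
    return [[sum(x_i * W[i][k] for i, x_i in enumerate(x))
             for k in range(len(W[0]))] for x in X]

# Hypothetical toy data: 4 trials, 2 voxels, 2 latent dimensions.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
W_true = [[1.0, 0.5], [-2.0, 1.0]]
Z = predict(X, W_true)                  # noiseless latent codes

W = ridge_fit(X, Z, lam=1e-6)           # near-zero penalty: W close to W_true
z_new = predict([[1.0, 2.0]], W)        # latent code for an unseen pattern
print(z_new)
```

In practice the number of voxels far exceeds the number of trials, so the penalty `lam` would be much larger and typically chosen by cross-validation.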


5 Open resources & future work 


5.1 Open resources 


High-quality datasets are essential for research on
data-driven machine learning methods. For the decoding of
visual information from human brain activity, the Open
NEURO¹ project is a free and open platform for sharing
MRI, MEG, EEG, iEEG and ECoG data. As an extended version
of the Open fMRI project, it now has 404 available
datasets and 12037 participants across all datasets.
Table 2 shows some datasets in the Open NEURO project.

Fig.6 A brief presentation of the results of some visual stimuli reconstruction methods in recent years (panels: Miyawaki et al. (2008), Naselaris et al. (2009), Nishimoto et al. (2011), Thirion et al. (2006), Cowen et al. (2014), Zhan et al. (2013), Du et al. (2018))

Table 2. Dataset descriptions in Open NEURO Project

ID      Title                                 Subjects  Categories  Time points  TR    TE  Ref
DS001   Balloon Analog Risk                   16        4           894          2000  44  [79]
DS002D  Deterministic classification          17        2           356          2000  20  [80]
DS002P  Probabilistic classification          17        2           356          2000  20  [80]
DS005   Mixed-gambles                         16        4           714          2000  30  [81]
DS011D  Dual-task weather prediction          14        3           408          2000  25  [82]
DS011W  Weather prediction without feedback   14        4           236          2000  25  [82]
DS017   Selective stop signal task            8         6           546          2000  25  [80]
DS052R  Reversal weather prediction           13        2           450          2000  20  [83]
DS052W  Weather prediction                    13        -           450          2000  20  [83]
DS102   Flanker task                          26        -           292          2000  20  [84]
DS105   Visual object recognition             6         8           1452         2500  30  [38]
DS107   Word and object processing            49        4           322          2000  28  [85]
DS116A  Auditory oddball                      17        -           510          2000  20  [86]
DS116V  Visual oddball                        17        -           510          2000  20  [86]
DS164   Stroop                                28        2           370          1500  10  [87]
DS231   Integration of sweet taste            9         6           1119         2000  30  [88]
DS232   Face-coding localizer (objects) task  10        4           760          1060  16  [47]

TR: Repetition time in milliseconds; TE: Echo time in milliseconds.

In order to promote the rapid development of the field
of brain science and neural computing, more and more
researchers have released their work as open source
online. These releases mainly include algorithms and
open-source software packages. For example, Chen et
al.[19, 32, 33] made their code on brain pattern analysis
available online, including some proposed algorithms and
the use of open-source libraries such as SciKit-Learn for
model training. Moreover, some research groups have
developed open-source software for brain image analysis.
One of the most famous examples is PyMVPA², developed by
the Haxby Lab³ at Dartmouth College. PyMVPA is a
Python-based open-source toolbox for applying
classifier-based analysis techniques to fMRI datasets. It
is a cross-platform toolbox that uses Python's ability to
access libraries written in a variety of programming
languages and computing environments in order to interface
with the wealth of existing machine learning
packages[89, 90].

Recently, a new toolbox called easy fMRI⁴ has been
developed for analyzing fMRI datasets (shown in Fig. 7).
Easy fMRI is a toolbox with the capability of decoding
and visualizing the human brain. It is free and open
source, and is developed by the iBRAIN⁵ research group of
Nanjing University of Aeronautics and Astronautics. It is
designed around the brain imaging data structure (BIDS)
file format, which supports automatic labelling of the
design matrix. Easy fMRI uses advanced machine learning
techniques and high-performance computing to analyze
task-based fMRI datasets. It provides a friendly graphical
user interface for feature analysis, HA, MVPA, RSA, etc.
In addition, easy fMRI is integrated with the FMRIB
Software Library (for the preprocessing step),
SciKit-Learn (for model analysis), PyTorch (for deep
learning methods), and AFNI/SUMA (for 3D visualization).
AFNI stands for analysis of functional neuroimages. SUMA
allows viewing 3D cortical surface models and mapping
volumetric data onto them.

¹ http://openneuro.org/
² http://www.pymvpa.org/
³ http://haxbylab.dartmouth.edu/
⁴ https://easyfmri.learningbymachine.com/
⁵ http://ibrain.nuaa.edu.cn/


5.2 Future work 


In this paper, we reviewed the methods developed and
employed for decoding visual information from human brain
activity. In future studies, several issues need to be
addressed. For instance, task-based fMRI is difficult to
collect due to the difficulty in keeping subjects' heads
stationary. Therefore, the sample sizes of most task-based
fMRI datasets are small. Some studies[91-93] have proposed
to address this by applying domain adaptation and transfer
learning algorithms. Making brain decoding algorithms
applicable to large-scale and multi-site fMRI datasets
remains an important issue that needs more study.

Furthermore, the three aspects we mentioned above,
i.e., brain image alignment, brain decoding and brain
image reconstruction, are usually studied independently.
In the future, we will consider combining them to deal
with more complex real-world problems. For example, Du et
al. mentioned that in the visual stimuli reconstruction
task, the reconstruction results of different subjects
were significantly different. To solve this problem, we
can combine HA and the reconstruction task to reduce
reconstruction differences across subjects.

Fig. 7 A screenshot of easy fMRI toolbox

Finally, most of the current methods do not make good
use of the structural information in whole-brain data. In
future studies, we plan to develop information-based
models that build on an understanding of the intrinsic
information of whole-brain data to smooth the information
of small areas. This would make the informative areas in
whole-brain data clearer and provide better input for
subsequent feature selection and representation similarity
analysis.


6 Conclusions 





In this paper, we have reviewed the mechanisms and
strategies of machine learning methods for analyzing
neural activities via fMRI data. As an interdisciplinary
field of research, computational neuroscience draws on
concepts from different disciplines such as mathematics,
psychology, and machine learning to break the neural
codes. However, there are still some challenges in the
field of fMRI research, such as multi-subject datasets,
high-dimensional feature analysis and the generation of
visual images from fMRI. We conducted a brief review of
the state-of-the-art machine learning techniques for
solving these challenges, including linear and nonlinear
functional alignment, multi-voxel pattern analysis,
representation similarity analysis and visual stimuli
reconstruction based on Bayesian models or deep neural
networks. Last but not least, we also provided online
resources and open research problems on brain pattern
analysis for the convenience of future research, and put
forward some ideas for future work in the field of brain
science and neural computing.